The current state of the art in data linkage is to match
entities from different data sources that do not share a
common identifier. Here, one-to-many data linkage is considered,
with the decision process based on a clustering tree. Prior work
has largely not addressed one-to-many data linkage; instead, it
has focused on linking entities of the same type. In this paper,
two new splitting criteria are introduced. They improve linkage
performance by selecting the best split at each node during
decision tree construction, and they help secure the linked data
from unauthorized use. Pruning techniques are applied to remove
anomalies from the clustering tree.
Sathya. T : ME Student, Department of Computer Science and Engineering
KSR College of Engineering, Anna University,
Namakkal, Tamilnadu 637215, India
Nithya. K : Assistant Professor, Department of Computer Science and Engineering
KSR College of Engineering, Anna University,
Namakkal, Tamilnadu 637215, India
Data linkage
clustering tree
splitting criteria
pruning
In this paper, we proposed a novel method, based on a
clustering tree, for linking data that does not share a common
identifier. With it, clusters of records from two different
datasets can be matched, which is the main challenge in data
linkage. We address it by applying one-to-many data linkage
with a one-class clustering tree, in particular in the
database misuse domain. The linkage is carried out with a
decision tree technique: each node of the tree is treated as
a cluster, and the matching data across the different
datasets is returned as the result. In this way, the
efficiency of the data linkage process is improved.
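
As a rough illustration of choosing the best split at a node, the following Python sketch scores each candidate attribute by how well it separates the linked records. It is a minimal sketch under stated assumptions, not the splitting criteria introduced in this paper: the Jaccard-based score, the function names (jaccard, split_score, best_split) and the toy records are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def jaccard(s1, s2):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0


def split_score(pairs, attr):
    """Average pairwise Jaccard similarity between the B-side record sets
    obtained by partitioning the linked pairs on an A-side attribute.
    A lower score means the partitions are better separated."""
    groups = {}
    for a_rec, b_rec in pairs:
        groups.setdefault(a_rec[attr], set()).add(b_rec)
    if len(groups) < 2:
        return 1.0  # the attribute does not split the data at all
    sims = [jaccard(g1, g2) for g1, g2 in combinations(groups.values(), 2)]
    return sum(sims) / len(sims)


def best_split(pairs, attrs):
    """Pick the attribute whose partitions overlap the least (assumed criterion)."""
    return min(attrs, key=lambda attr: split_score(pairs, attr))


# Hypothetical toy data: (record from table A, linked record from table B).
pairs = [
    ({"city": "Paris", "role": "admin"}, "invoice"),
    ({"city": "Paris", "role": "guest"}, "invoice"),
    ({"city": "Rome", "role": "admin"}, "payroll"),
    ({"city": "Rome", "role": "guest"}, "payroll"),
]

best = best_split(pairs, ["city", "role"])
print(best)
```

In this toy data, splitting on "city" yields two groups whose linked records do not overlap, whereas splitting on "role" yields two groups with identical linked records, so "city" is preferred. A full tree builder would apply such a choice recursively at each node and then prune the result.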